This notebook explores land size distributions in household survey data. I begin by assessing the data we have available, looking specifically at the number of samples, spatial coverage and completeness. I then explore the land size data and explain why land size might be an interesting variable to study. Finally, I examine the distributions of land size in different locations.

Primary Questions

Assessing Data Coverage and Completeness

We are interested in the distribution of farm characteristics at the subnational level. First, we need to check how much data we have in each subnational area, and whether there is enough data per area to plot a distribution.
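A minimal sketch of this check, assuming each household record is a dict with hypothetical keys "country" and "area". The minimum sample size per area (`MIN_HOUSEHOLDS = 30`) is also an assumption; the notebook does not state a threshold.

```python
from collections import Counter

# Hypothetical minimum number of households needed to plot a distribution.
# The notebook does not specify a value; 30 is an illustrative assumption.
MIN_HOUSEHOLDS = 30


def households_per_area(records):
    """Count surveyed households per (country, subnational area) pair.

    `records` is assumed to be an iterable of dicts with the hypothetical
    keys "country" and "area"."""
    return Counter((r["country"], r["area"]) for r in records)


def areas_with_enough_data(records, min_households=MIN_HOUSEHOLDS):
    """Return the sorted (country, area) pairs with at least `min_households` records."""
    counts = households_per_area(records)
    return sorted(key for key, n in counts.items() if n >= min_households)
```

For example, an area with 40 records passes the check while an area with 10 does not.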

For this study, we are interested in areas where we have both household survey data and external datasets (census, climate, and land coverage). Nine countries fulfill these criteria.

Looking at the number of surveys per country, three candidates stand out because they have two helpful characteristics:

  1. They have a relatively large number of surveys (n>2000)
  2. They have surveys in a significant portion of their subnational areas (>40%)

The three countries are Burkina Faso (West Africa), Rwanda (Central Africa), and Tanzania (East Africa).
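The two selection criteria above can be expressed as a simple filter. This is a sketch under assumptions: `CountrySummary` and its fields are hypothetical names, and the example countries in the usage note are placeholders, not the survey data.

```python
from dataclasses import dataclass


@dataclass
class CountrySummary:
    """Hypothetical per-country summary of survey coverage."""
    name: str
    n_households: int        # number of household surveys in the country
    areas_with_surveys: int  # subnational areas containing at least one survey
    total_areas: int         # all subnational areas in the country

    @property
    def coverage(self) -> float:
        """Share of subnational areas that contain at least one survey."""
        return self.areas_with_surveys / self.total_areas


def select_candidates(summaries, min_households=2000, min_coverage=0.40):
    """Keep countries with more than `min_households` surveys and more than
    `min_coverage` of their subnational areas surveyed."""
    return [s.name for s in summaries
            if s.n_households > min_households and s.coverage > min_coverage]
```

With placeholder inputs, a country with 2,500 surveys covering 5 of 10 areas is kept, while one with only 1,500 surveys, or one covering just 3 of 10 areas, is dropped.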

Why Land Size

In this study, we focus on land size, which is an informative variable for smallholder farmers. It can tell us a lot about other variables, some of which are more error-prone or more difficult to measure (e.g. income and food security).

If we plot land size directly against income and food security, the relationship can be hard to see because of outliers and high variation. However, if we look at quantiles and distributions, a possible association begins to emerge between land size and total income, food security, and livestock holdings.

Interestingly, the spread varies across quantiles. For example, for larger land sizes, income values are more spread out.
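One way to look at land size by quantiles, as described above, is to bin households by land-size quantile and summarise income within each bin. The sketch below assumes two parallel lists (`land`, `income`) with one entry per household; the function names are hypothetical, and it uses only the standard-library `statistics` module.

```python
import statistics


def iqr(values):
    """Interquartile range, using statistics.quantiles (default 'exclusive' method)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def summarise_by_land_quantile(land, income, n_bins=4):
    """Group households into land-size quantile bins and report the
    median and IQR of income in each bin."""
    cuts = statistics.quantiles(land, n=n_bins)  # n_bins - 1 cut points
    bins = [[] for _ in range(n_bins)]
    for size, y in zip(land, income):
        # The number of cut points this land size exceeds is its bin index.
        idx = sum(size > c for c in cuts)
        bins[idx].append(y)
    return [
        {
            "bin": i,
            "n": len(ys),
            "median_income": statistics.median(ys),
            # IQR needs at least two values; report 0.0 for a single household.
            "iqr_income": iqr(ys) if len(ys) >= 2 else 0.0,
        }
        for i, ys in enumerate(bins)
        if ys
    ]
```

Comparing `iqr_income` across bins is a direct way to check whether the income spread widens for larger land sizes.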

Why Map the Spread of Land Sizes

Mapping efforts often focus on averages. For example, the Lowder article mapped average farm size per subnational unit using census information and information on total arable land.

However, land sizes vary widely within different areas. We see a wide range of land size distributions, which differ considerably by country.

If we were to use land size to prioritise development interventions, it would be important to account for the characteristics of land size distributions.

In the plots below, there appears to be an association between the average land cultivated (mean or median) and the spread of land cultivated (SD or IQR). This relationship appears to be heteroskedastic: areas with a larger average cultivated land also tend to have a greater spread (i.e. households in those areas cultivate a wider range of land sizes).
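The per-area statistics behind such plots can be computed as below. This is a hedged sketch: it assumes land sizes are grouped in a dict mapping a (hypothetical) area name to a list of land sizes in hectares, and the Pearson correlation is written out by hand rather than relying on `statistics.correlation`, which needs Python 3.10 or later.

```python
import statistics


def area_summaries(land_by_area):
    """Location (mean, median) and spread (sd, IQR) of land cultivated per area.

    `land_by_area` maps an area name to a list of land sizes in hectares
    (at least two households per area)."""
    out = {}
    for area, sizes in land_by_area.items():
        q1, _, q3 = statistics.quantiles(sizes, n=4)
        out[area] = {
            "mean": statistics.mean(sizes),
            "median": statistics.median(sizes),
            "sd": statistics.stdev(sizes),  # sample standard deviation
            "iqr": q3 - q1,
        }
    return out


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```

Passing the per-area means and SDs to `pearson` gives a single number summarising the mean/spread association; a positive value matches the pattern described above.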

We also see that in most subnational areas, land size distributions are skewed, and in many of these areas the distributions are fat-tailed (kurtosis above 3, the kurtosis of a normal distribution).
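Skewness and kurtosis per area can be estimated with the simple moment-based formulas below. This is a sketch using population (biased) moment estimators, not necessarily the estimator used for the plots; note it returns plain kurtosis, not excess kurtosis, so a normal distribution corresponds to 3.

```python
import statistics


def moment_skew_kurtosis(values):
    """Return (skewness, kurtosis) from population central moments:
    skewness = m3 / m2**1.5, kurtosis = m4 / m2**2 (normal distribution -> 3)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2
```

An area could then be flagged as right-skewed if the first value is positive, and as fat-tailed if the second exceeds 3.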

What do these distributions mean in practice? Let’s say we are using land size to target development interventions towards the poorest of the poor.

We could target interventions based on mean land cultivated, i.e. we assume that a lower mean land cultivated means more households with smaller farms, and that households with smaller farms have lower incomes or poorer food security status.

Let us see what this would look like with the data we have. Suppose we set a threshold and are interested in households farming less than 2 hectares. The proportion of households below this threshold varies considerably depending on the location. For example, among areas where the mean land cultivated is about 1.8 ha, the proportion of households cultivating less than 2 ha ranges from about 35% to 100%.
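The reason the mean alone can mislead is easy to reproduce with made-up numbers. The two areas below are illustrative values, not survey data: both have a mean of 1.8 ha, but every household in the first is below the 2 ha threshold, while only three of five are in the second.

```python
import statistics

THRESHOLD_HA = 2.0  # households farming less than 2 ha, as in the text


def share_below(sizes, threshold=THRESHOLD_HA):
    """Proportion of households cultivating strictly less than `threshold` ha."""
    return sum(s < threshold for s in sizes) / len(sizes)


# Two made-up areas (illustrative numbers, not survey data) with the same mean.
uniform_area = [1.8, 1.8, 1.8, 1.8, 1.8]
skewed_area = [0.5, 0.5, 0.5, 2.5, 5.0]

for name, sizes in [("uniform", uniform_area), ("skewed", skewed_area)]:
    print(name, statistics.mean(sizes), share_below(sizes))
```

The first area has 100% of households below the threshold and the second 60%, even though a mean-based ranking would treat them identically.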

In large-scale surveys, this type of small-scale variation is much harder to see, as often only a few households are sampled in each subnational area.
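A rough way to quantify why a handful of households per area is not enough is the standard error of a sample proportion, sqrt(p(1 - p) / n). The sketch below assumes simple random sampling and ignores survey design effects, so it understates the real uncertainty in clustered surveys.

```python
import math


def proportion_se(p, n):
    """Approximate standard error of a sample proportion p estimated from n
    households (simple random sampling assumed; design effects ignored)."""
    return math.sqrt(p * (1 - p) / n)
```

For a true share of 0.6 below the threshold, 5 sampled households give a standard error of about 0.22, while 500 households give about 0.022, ten times smaller.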

Covariates

An initial exploration makes it clear that there is more going on at the subnational level than variation in the mean. In this study, we want to see whether we can predict this variation.

To predict characteristics of subnational heterogeneity, we need data at the subnational level.

For this study, we have aggregated census data (GEO2 level), climate data, and land-cover information. Here are the variables we have:

We don’t have all of these variables for every subnational area, so let’s see what we do have.

This assessment of data completeness suggests that covariates are not available in the Nigeria data, and that they are missing for one region in Kenya.

Data on Agricultural Employment are missing for Nigeria, Kenya, Burkina Faso, and Ethiopia; data on “telephone” are missing for Nigeria, Kenya, and Mali; and data on “toilet” are missing for Kenya and Nigeria.
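A completeness check like this can be tabulated as the share of subnational areas missing each covariate, per country. The sketch assumes one dict per subnational area with a "country" key and one key per covariate, with `None` or an absent key marking a missing value; the row layout is hypothetical, and only the variable names "telephone" and "toilet" come from the text.

```python
from collections import defaultdict


def missing_share(rows, variables):
    """Share of subnational areas with a missing value, per country and variable.

    `rows` is assumed to be a list of dicts (one per subnational area) with a
    "country" key and one key per covariate; None or an absent key = missing."""
    totals = defaultdict(int)
    missing = defaultdict(lambda: defaultdict(int))
    for row in rows:
        country = row["country"]
        totals[country] += 1
        for var in variables:
            if row.get(var) is None:
                missing[country][var] += 1
    return {c: {v: missing[c][v] / totals[c] for v in variables}
            for c in sorted(totals)}
```

A share of 1.0 means the covariate is missing for the whole country, while a value between 0 and 1 points to gaps in individual regions.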

Modelling Location, Scale and Shape

In distributional models,
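As a hedged illustration of what modelling location, scale and shape can look like, the general form below assumes the GAMLSS framework (generalized additive models for location, scale and shape); the notebook does not state which model or distribution family is used. Each parameter of an assumed land-size distribution D gets its own link function g_k and its own predictor built from subnational covariates:

```latex
y_i \sim \mathcal{D}(\mu_i, \sigma_i, \nu_i, \tau_i), \qquad
g_k(\theta_{k,i}) = \eta_{k,i}
  = \mathbf{x}_{k,i}^{\top}\boldsymbol{\beta}_k + \sum_{j} s_{kj}(x_{kj,i}),
\qquad k = 1, \dots, 4,
```

Here \(\theta_1 = \mu\) is the location, \(\theta_2 = \sigma\) the scale, and \(\theta_3 = \nu\), \(\theta_4 = \tau\) are shape parameters (often related to skewness and kurtosis); \(s_{kj}\) are optional smooth terms, and each parameter may use a different set of covariates.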

Notes

This exploration is a step towards developing procedures to map smallholder heterogeneity. I do not want to imply that mapping land size is the only thing needed to target the “poorest of the poor”.

Subnational Level Covariates